Data exploration in phylogenetic inference: scientific, heuristic, or neither
Abstract
The methods of data exploration have become the centerpiece of phylogenetic inference, but without the scientific importance of those methods having been identified. We examine in some detail the procedures and justifications of Wheeler's sensitivity analysis and relative rate comparison (saturation analysis). In addition, we review methods designed to explore evidential decisiveness, clade stability, transformation series additivity, methodological concordance, sensitivity to prior probabilities (Bayesian analysis), skewness, computer-intensive tests, long-branch attraction, model assumptions (likelihood ratio test), sensitivity to amount of data, polymorphism, clade concordance index, character compatibility, partitioned analysis, spectral analysis, relative apparent synapomorphy analysis, and congruence with a "known" phylogeny. In our review, we consider a method to be scientific if it performs empirical tests, i.e., if it applies empirical data that could potentially refute the hypothesis of interest. Methods that do not perform tests, and therefore are not scientific, may nonetheless be heuristic in the scientific enterprise if they point to more weakly or ambiguously corroborated hypotheses, such propositions being more easily refuted than those that have been more severely tested and are more strongly corroborated. Based on common usage, data exploration in phylogenetics is accomplished by any method that performs sensitivity or quality analysis. Sensitivity analysis evaluates the responsiveness of results to variation or errors in parameter values and assumptions. Sensitivity analysis is generally interpreted as providing a measure of support, where conclusions that are insensitive (robust, stable) to perturbations are judged to be accurate, probable, or reliable. As an alternative to that verificationist concept, we define support objectively as the degree to which critical evidence refutes competing hypotheses.
As such, degree of support is secondary to the scientific optimality criterion of maximizing explanatory power. Quality analyses purport to distinguish good, reliable, accurate data from bad, misleading, erroneous data, thereby assessing the ability of data to indicate the true phylogeny. Only the quality analysis of character compatibility can be judged scientific—and a weak test at that compared to character congruence. Methods judged to be heuristic include Bremer support, long-branch extraction, and safe taxonomic reduction, and we underscore the great heuristic potential of a posteriori analysis of patterns of transformations on the total-evidence cladogram. However, of the more than 20 kinds of data exploration methods evaluated, the vast majority is neither scientific nor heuristic. Given so little demonstrated cognitive worth, we conclude that undue emphasis has been placed on data exploration in phylogenetic inference, and we urge phylogeneticists to consider more carefully the relevance of the methods that they employ.
Similar resources
PsodaScript: Applying Advanced Language Constructs to Open-source Phylogenetic Search
Due to the immensity of phylogenetic tree space for large data sets, researchers must rely on heuristic searches to infer reasonable phylogenies. By designing meta-searches which appropriately combine a variety of heuristics and parameter settings, researchers can significantly improve the performance of heuristic searches. Advanced language constructs in the open-source PSODA project—including ...
Algorithms in supertree inference and phylogenetic data mining
Science and society would benefit enormously from comprehensive phylogenetic knowledge of the Tree of Life (ToL), a framework that includes more than 1.7 million species on Earth. ToL shows how living things have evolved since the origins of life billions of years ago. With existing computational approaches, we cannot produce a global estimate of evolutionary history from molecular data of all thes...
A genetic algorithm for maximum-likelihood phylogeny inference using nucleotide sequence data.
Phylogeny reconstruction is a difficult computational problem, because the number of possible solutions increases with the number of included taxa. For example, for only 14 taxa, there are more than seven trillion possible unrooted phylogenetic trees. For this reason, phylogenetic inference methods commonly use clustering algorithms (e.g., the neighbor-joining method) or heuristic search strate...
Bayesian support is larger than bootstrap support in phylogenetic inference: a mathematical argument.
In phylogenetic inference, the support of an estimated phylogenetic tree topology and its interior branches is usually measured either with non-parametric bootstrap support (BS) values or with Bayesian posterior probabilities (BPPs). Extensive empirical evidence indicates that BPP values are systematically larger than BS when measured on the same data set, but there are no theoretical results s...
Adaptive Neuro-Fuzzy Inference System application for hydrothermal alteration mapping using ASTER data
The main problem associated with the traditional approach to image classification for the mapping of hydrothermal alteration is that materials not associated with hydrothermal alteration may be erroneously classified as hydrothermally altered due to the similar spectral properties of altered and unaltered minerals. The major objective of this paper is to investigate the potential of a neuro-fuz...
Publication date: 2003